InfoMagic Internet Tools 1995 April

Text File | 1992-11-30 | 5.6 KB | 144 lines

Dan, you say

<< I suppose you could come up with a DTD that describes something close to the current HTML, but I'm not sure of the value of it. HTML allows tags to be pretty much sprinkled wherever you feel like putting them. Any DTD that allows that much leeway just looks like this:

    <!ENTITY % alltags "TITLE|H1|H2|H3|MENU|OL|UL">
    <!ELEMENT %alltags (%alltags)*>

i.e. every element is just a repeatable or-group of all the elements. Then the SGML parser can't do any minimization cuz nothing's required. >>

Yes, current HTML is just a linear sequence of elements. There is a reason for this: it is very convenient for HTML to map onto a series of styles -- for two reasons.

Firstly, a lot of rich text objects can hold styles but can't hold structure. You can deduce structure from the styles -- like Word deducing outlining from Heading styles, and WWW deducing a list <UL> from a lot of <LI> paragraphs. But you can't go very far. If you want to make a HT editor out of such a text object, you have to regenerate the elements from the styles.

Secondly, it may be that the wysiwyg editors have a linear style structure because that is intuitive to people. I don't know a lot of people who use author/editor (which maintains structure). Maybe real people actually think in terms of styles and fix the document to look right, then they are happy to have the structure deduced.

So if we went for a nestable HTML, which would be cleaner for those who appreciate recursion, we would have to have a hypertext editor which made the structure visible. I don't have experience enough to know whether real information providers (group secretaries, for example) would be into generating nested elements -- maybe the styles are useful to keep as the current `user interface metaphor' of word processors. (It also makes writing the editor easier!)
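The deduction of structure from styles mentioned above, such as recovering a <UL> from a run of <LI> paragraphs, can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Node` class, the `deduce_structure` function and the (style, text) pair representation are hypothetical and are not taken from any WWW or editor code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A structural element rebuilt from a flat run of styled paragraphs."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def deduce_structure(paragraphs):
    """Turn a linear list of (style, text) pairs into a small tree.

    Consecutive "LI" paragraphs are wrapped in one implied "UL" node;
    every other style becomes a leaf node in document order.
    """
    root = Node("DOC")
    open_list = None  # the implied UL currently being filled, if any
    for style, text in paragraphs:
        if style == "LI":
            if open_list is None:
                open_list = Node("UL")
                root.children.append(open_list)
            open_list.children.append(Node("LI", text))
        else:
            open_list = None  # any non-LI style closes the implied list
            root.children.append(Node(style, text))
    return root


# Hypothetical flat document: one heading, two list items, one paragraph.
flat = [
    ("H1", "Tools"),
    ("LI", "browser"),
    ("LI", "editor"),
    ("P", "More text."),
]
tree = deduce_structure(flat)
print([child.tag for child in tree.children])  # prints ['H1', 'UL', 'P']
```

The sketch recovers only one level of nesting, which matches the remark that you can't go very far this way: nothing in a flat run of styles says where a nested list would begin or end.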
Or maybe we should have two levels of DTD -- one basically linear and mandatory (and precompiled for fast access) and one more sophisticated for larger documents.

Of course, when you are writing hypertext the large documents are normally broken down into small bits to make traversing them quick. So whereas each hypertext node may contain only H1 and H2 headings, when a book is generated a la the_www_book.ps you get 5 levels of heading from the whole tree. So that is why the HTML structure is so simple.

I am open to a more sophisticated alternative.

Tim
____________________________

From connolly@pixel.convex.com Fri Jun 26 00:00:33 1992
Return-Path: <connolly@pixel.convex.com>
Received: from dxmint.cern.ch by nxoc01.cern.ch (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
        id AA02722; Fri, 26 Jun 92 00:00:27 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
        id AA25540; Fri, 26 Jun 92 00:00:11 +0200
Received: from pixel.convex.com by convex.convex.com (5.64/1.35)
        id AA10700; Thu, 25 Jun 92 17:00:01 -0500
Received: from localhost by pixel.convex.com (5.64/1.28)
        id AA05209; Thu, 25 Jun 92 17:00:00 -0500
Message-Id: <9206252200.AA05209@pixel.convex.com>
To: timbl@nxoc01.cern.ch (Tim Berners-Lee)
Subject: Re: HTML DTD
In-Reply-To: Your message of "Thu, 25 Jun 92 23:07:25 +0700."
        <9206252107.AA02534@nxoc01.cern.ch>
Date: Thu, 25 Jun 92 16:59:59 CDT
From: Dan Connolly <connolly@pixel.convex.com>
Status: R

>thanks for that contribution. Not being as hot on SGML
>as I ought to be, I don't see why the HREF has to refer to
>an entity declared separately rather than directly having
>a string argument.
>

That's actually left over from when I was trying to point HREF attributes to MIME attachments. It's not really necessary to move the UDIs into entities as long as you're careful that the UDI syntax is a subset of the SGML attribute literal syntax.
Beware, for example, that an SGML parser will expand entity references in an attribute literal to produce the CDATA for the attribute value. So <A HREF="A&P"> might be OK for the linemode browser, but an SGML parser will try to resolve &P.

Also, SGML attribute values have a maximum length specified in the SGML declaration. The default value is 960 or something around there.

>The title is in fact optional currently, by the way ...
>we could keep it so though it "ought" always to have one.
>
>I'd like a DTD which as closely reflects the current HTML as
>possible.

I suppose you could come up with a DTD that describes something close to the current HTML, but I'm not sure of the value of it. HTML allows tags to be pretty much sprinkled wherever you feel like putting them. Any DTD that allows that much leeway just looks like this:

    <!ENTITY % alltags "TITLE|H1|H2|H3|MENU|OL|UL">
    <!ELEMENT %alltags (%alltags)*>

i.e. every element is just a repeatable or-group of all the elements. Then the SGML parser can't do any minimization cuz nothing's required.

> Then, if we change HTML to HTML2, I would
>change it in a number of ways, in particular to include
>separate header and body parts. I have come across the
>"Davenport" group of publishers who are defining DTDs for
>technical documentation. They include Steve Newcombe who
>is the HyTime guy (or one of the two I should say).
>I would like to get some input from them.
>

Certainly we should keep tabs on things like the Davenport group and HyTime. But my immediate concern is these little syntactic differences that render HTML documents worthless to an SGML parser.

The current HTML and UDI syntax make a good proof of concept, but we need to move toward formal definitions so that we can have confidence that correct implementations will interoperate.

More later...

Dan
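
The warning about UDIs used directly as attribute literals can be turned into a rough check. The sketch below is an assumption-laden illustration, not a real SGML parser: the function name `attribute_literal_problems` is hypothetical, the pattern assumes that "&" followed by a letter (or "&#" followed by a letter or digit) is what a parser would treat as the start of a reference, and the default `max_len` of 960 simply echoes the approximate figure quoted in the message; the real limit comes from the SGML declaration in use.

```python
import re

# Assumed rule: "&" followed by a letter opens an entity reference, and
# "&#" followed by a letter or digit opens a character reference.
_REFERENCE_OPEN = re.compile(r"&(?:#[0-9A-Za-z]|[A-Za-z])")


def attribute_literal_problems(value, max_len=960):
    """Return a list of reasons why value is risky as a quoted attribute literal."""
    problems = []
    match = _REFERENCE_OPEN.search(value)
    if match:
        problems.append(f"reference would be resolved at offset {match.start()}")
    if '"' in value:
        problems.append("contains the literal delimiter")
    if len(value) > max_len:
        # max_len is an assumption taken from the message, not from a declaration
        problems.append(f"longer than {max_len} characters")
    return problems


print(attribute_literal_problems("A&P"))
# prints ['reference would be resolved at offset 1']

# Hypothetical address, used only to show a value that passes the check.
print(attribute_literal_problems("http://example.org/docs/intro.html"))
# prints []
```

A check like this only flags strings that fall outside the safe subset; it says nothing about whether the string is a well-formed UDI.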